Robustness of sentence length measures in written texts
Similar resources
Word-length entropies and correlations of natural language written texts
We study the frequency distributions and correlations of the word lengths of ten European languages. Our findings indicate that a) the word-length distribution of short words quantified by the mean value and the entropy distinguishes the Uralic (Finnish) corpus from the others, b) the tails at long words, manifested in the high-order moments of the distributions, differentiate the Germanic lang...
The Effect of Length of Pre-task Planning Time on Discourse-analytic Measures and Analytic Ratings in L2 Written Narratives
The favorable gains gleaned from the provision of pre-task planning time (PTP) have struck a chord with SLA researchers as they try to manipulate task features to promote language production and development. In a similar vein, the present study is a two-fold attempt to first compare the effect of the length of pre-task planning time on discourse-analytic measures in narrative written production...
Automatic Structuring of Written Texts
This paper deals with automatic structuring and sentence boundary labelling in natural language texts. We describe the implemented structure tagging algorithm and heuristic rules that are used for automatic or semiautomatic labelling. Inside the detected sentence the algorithm performs a decomposition to clauses and then marks the parts of text which do not form a sentence, i.e. headings, signa...
Discourse Segmentation of German Written Texts
Discourse segmentation is the division of a text into minimal discourse segments, which form the leaves in the trees that are used to represent discourse structures. A definition of elementary discourse segments in German is provided by adapting widely used segmentation principles for English minimal units, while considering punctuation, morphology, syntax, and aspects of the logical document st...
Detecting Sentence Boundaries in Sanskrit Texts
The paper applies a deep recurrent neural network to the task of sentence boundary detection in Sanskrit, an important, yet underresourced ancient Indian language. The deep learning approach improves the F scores set by a metrical baseline and by a Conditional Random Field classifier by more than 10%.
Journal
Journal title: Physica A: Statistical Mechanics and its Applications
Year: 2018
ISSN: 0378-4371
DOI: 10.1016/j.physa.2018.04.104